High-quality Training Data Selection using Latent Topics for Graph-based Semi-supervised Learning

نویسندگان

  • Akiko Eriguchi
  • Ichiro Kobayashi
چکیده

In a multi-class document categorization using graph-based semi-supervised learning (GBSSL), it is essential to construct a proper graph expressing the relation among nodes and to use a reasonable categorization algorithm. Furthermore, it is also important to provide high-quality correct data as training data. In this context, we propose a method to construct a similarity graph by employing both surface information and latent information to express similarity between nodes and a method to select high-quality training data for GBSSL by means of the PageRank algorithm. Experimenting on Reuters21578 corpus, we have confirmed that our proposed methods work well for raising the accuracy of a multi-class document categorization.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models

Supervisory signals have the potential to make low-dimensional data representations, like those learned by mixture and topic models, more interpretable and useful. We propose a framework for training latent variable models that explicitly balances two goals: recovery of faithful generative explanations of high-dimensional data, and accurate prediction of associated semantic labels. Existing app...

متن کامل

Supervised and Semisupervised Clustering Based on Feature Selection Algorithm Process

In clustering process, semi-supervised learning is a tutorial of contrivance learning methods that make usage of both labeled and unlabeled data for training characteristically a trifling quantity of labeled data with a great quantity of unlabeled data. Semi-supervised learning cascades in the middle of unsupervised learning (without any labeled training data) and supervised learning (with comp...

متن کامل

Feature Selection Algorithm for Supervised and Semisupervised Clustering

−In clustering process, semi-supervised learning is a tutorial of contrivance learning methods that make usage of both labeled and unlabeled data for training characteristically a trifling quantity of labeled data with a great quantity of unlabeled data. Semi-supervised learning cascades in the middle of unsupervised learning (without any labeled training data) and supervised learning (with com...

متن کامل

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...

متن کامل

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013